Hits 1 – 20 of 29

1	The effect of domain and diacritics in Yorùbá-English neural machine translation
	Adelani, David; Ruiter, Dana; Alabi, Jesujoba; Adebonojo, Damilola; Ayeni, Adesina; Adeyemi, Mofetoluwa; Awokoya, Ayodele; España-Bonet, Cristina
	In: 18th Biennial Machine Translation Summit, Aug 2021, Orlando, United States (2021) ; https://hal.inria.fr/hal-03350967
	Abstract: Massively multilingual machine translation (MT) has shown impressive capabilities, including zero and few-shot translation between low-resource language pairs. However, these models are often evaluated on high-resource languages with the assumption that they generalize to low-resource ones. The difficulty of evaluating MT models on low-resource pairs is often due to lack of standardized evaluation datasets. In this paper, we present MENYO-20k, the first multi-domain parallel corpus with a special focus on clean orthography for Yorùbá-English with standardized train-test splits for benchmarking. We provide several neural MT benchmarks and compare them to the performance of popular pre-trained (massively multilingual) MT models both for the heterogeneous test set and its subdomains. Since these pre-trained models use huge amounts of data with uncertain quality, we also analyze the effect of diacritics, a major characteristic of Yorùbá, in the training data. We investigate how and when this training condition affects the final quality and intelligibility of a translation. Our models outperform massively multilingual models such as Google (+8.7 BLEU) and Facebook M2M (+9.1 BLEU) when translating to Yorùbá, setting a high quality benchmark for future research.
	Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
	URL: https://hal.inria.fr/hal-03350967 https://hal.inria.fr/hal-03350967/document https://hal.inria.fr/hal-03350967/file/adelani_MTSummit2021.pdf
	BASE

2	Europarl Direct Translationese Dataset ...
	Amponsah-Kaakyire, Kwabena; Pylypenko, Daria; España-Bonet, Cristina. - : Zenodo, 2021
	BASE

3	Europarl Direct Translationese Dataset ...
	Amponsah-Kaakyire, Kwabena; Pylypenko, Daria; España-Bonet, Cristina. - : Zenodo, 2021
	BASE

4	Europarl Direct Translationese Dataset ...
	Amponsah-Kaakyire, Kwabena; Pylypenko, Daria; España-Bonet, Cristina. - : Zenodo, 2021
	BASE

5	A Data Augmentation Approach for Sign-Language-To-Text Translation In-The-Wild ...
	Nunnari, Fabrizio; España-Bonet, Cristina; Avramidis, Eleftherios. - : Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021
	BASE

6	The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation ...
	Adelani, David I.; Ruiter, Dana; Alabi, Jesujoba O.. - : arXiv, 2021
	BASE

7	Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages ...
	Ruiter, Dana; Klakow, Dietrich; van Genabith, Josef. - : arXiv, 2021
	BASE

8	Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification ...
	Pylypenko, Daria; Amponsah-Kaakyire, Kwabena; Chowdhury, Koel Dutta. - : arXiv, 2021
	BASE

9	Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification ...
	The 2021 Conference on Empirical Methods in Natural Language Processing; Amponsah-Kaakyire, Kwabena; Dutta Chowdhury, Koel. - : Underline Science Inc., 2021
	BASE

10	Automatic classification of human translation and machine translation : a study from the perspective of lexical diversity
	Fu, Yingxue; Nederhof, Mark Jan. - : Linköping University Electronic Press, 2021
	BASE

11	Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction ...
	España-Bonet, Cristina; Barrón-Cedeño, Alberto; Màrquez, Lluís. - : arXiv, 2020
	BASE

12	WTC1.1 (WikiTailor corpus v. 1.1) ...
	España-Bonet, Cristina; Barrón-Cedeño, Alberto; Màrquez, Lluís. - : Zenodo, 2020
	BASE

13	MT models for multilingual CLuBS engine (en-de-fr-es) ...
	España-Bonet, Cristina; Henning, Sophie; Ramthun, Roland. - : Zenodo, 2020
	BASE

14	WTC1.0 (WikiTailor corpus v. 1.0) ...
	España-Bonet, Cristina; Barrón-Cedeño, Alberto; Màrquez, Lluís. - : Zenodo, 2020
	BASE

15	WTC1.1 (WikiTailor corpus v. 1.1) ...
	España-Bonet, Cristina; Barrón-Cedeño, Alberto; Màrquez, Lluís. - : Zenodo, 2020
	BASE

16	MT models for multilingual CLuBS engine (en-de-fr-es) ...
	España-Bonet, Cristina; Henning, Sophie; Ramthun, Roland. - : Zenodo, 2020
	BASE

17	Multilingual and Interlingual Semantic Representations for Natural Language Processing: A Brief Introduction
	Costa-jussà, Marta R.; España-Bonet, Cristina; Fung, Pascale...
	In: Computational Linguistics, Vol 46, Iss 2, Pp 249-255 (2020)
	BASE

18	GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies ...
	Costa-jussà, Marta R.; Lin, Pau Li; España-Bonet, Cristina. - : arXiv, 2019
	BASE

19	Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi ...
	Alabi, Jesujoba O.; Amponsah-Kaakyire, Kwabena; Adelani, David I.. - : arXiv, 2019
	BASE

20	Query Translation for Cross-lingual Search in the Academic Search Engine PubPsych ...
	España-Bonet, Cristina; Stiller, Juliane; Ramthun, Roland. - : PsychArchives, 2018
	BASE
